110 research outputs found

    Differential meta-analysis of RNA-seq data from multiple studies

    High-throughput sequencing is now regularly used for studies of the transcriptome (RNA-seq), particularly for comparisons among experimental conditions. For the time being, a limited number of biological replicates are typically considered in such experiments, leading to low detection power for differential expression. As the cost of these experiments continues to decrease, it is likely that additional follow-up studies will be conducted to re-address the same biological question. We demonstrate how p-value combination techniques previously used for microarray meta-analyses can be used for the differential analysis of RNA-seq data from multiple related studies. These techniques are compared to a negative binomial generalized linear model (GLM) including a fixed study effect on simulated data and real data on human melanoma cell lines. The GLM with fixed study effect performed well for low inter-study variation and small numbers of studies, but was outperformed by the meta-analysis methods for moderate to large inter-study variability and larger numbers of studies. To conclude, the p-value combination techniques illustrated here are a valuable tool to perform differential meta-analyses of RNA-seq data by appropriately accounting for biological and technical variability within studies as well as additional study-specific effects. An R package metaRNASeq is available on the R Forge.
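As an illustration of the p-value combination idea in the abstract above, here is a minimal sketch of Fisher's method, one well-known combination technique. It assumes the per-study p-values for a gene are independent and lie in (0, 1]; the function name `fisher_combine` is hypothetical and this is not the metaRNASeq API.

```python
import math


def fisher_combine(pvalues):
    """Combine independent per-study p-values for one gene (Fisher's method).

    Assumes every p-value lies in (0, 1]; a p-value of 0 would make log() fail.
    """
    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(log p), chi-square with 2k degrees
    # of freedom under the null. We work with X / 2 directly.
    half_stat = -sum(math.log(p) for p in pvalues)
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for the same gene from three studies.
combined = fisher_combine([0.04, 0.10, 0.07])
```

With a single study the combined value reduces to the original p-value, which is a quick sanity check on the formula.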

    Use of the score test as a goodness-of-fit measure of the covariance structure in genetic analysis of longitudinal data

    Model selection is an essential issue in longitudinal data analysis since many different models have been proposed to fit the covariance structure. The likelihood criterion is commonly used and allows the fit of alternative models to be compared. Its value does not reflect, however, the potential improvement that can still be reached in fitting the data unless a reference model with the actual covariance structure is available. The score test approach does not require the knowledge of a reference model, and the score statistic has a meaningful interpretation in itself as a goodness-of-fit measure. The aim of this paper was to show how the score statistic may be separated into the genetic and environmental parts, which is difficult with the likelihood criterion, and how it can be used to check parametric assumptions made on variance and correlation parameters. Selection of models for genetic analysis was applied to a dairy cattle example for milk production.

    Estimation of genetic parameters for test day records of dairy traits in the first three lactations

    Application of test-day models for the genetic evaluation of dairy populations requires the solution of large mixed model equations. The size of the (co)variance matrices required with such models can be reduced through the use of their first eigenvectors. Here, the first two eigenvectors of (co)variance matrices estimated for dairy traits in first lactation were used as covariables to jointly estimate genetic parameters of the first three lactations. These eigenvectors appear to be similar across traits and have a biological interpretation, one being related to the level of production and the other to persistency. Furthermore, they explain more than 95% of the total genetic variation. Variances and heritabilities obtained with this model were consistent with previous studies. High correlations were found among production levels in different lactations. Persistency measures were less correlated. Genetic correlations between second and third lactations were close to one, indicating that these can be considered as the same trait. Genetic correlations within lactation were high except between extreme parts of the lactation. This study shows that the use of eigenvectors can reduce the rank of (co)variance matrices for the test-day model and can provide consistent genetic parameters.
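The rank-reduction idea in the abstract above rests on the share of total variance carried by the leading eigenvectors of a (co)variance matrix. The sketch below computes that share with power iteration and deflation in plain Python. It assumes a symmetric positive semi-definite matrix and a starting vector that is not orthogonal to the leading eigenvector; all function names and the example matrix are hypothetical, and the paper's own estimation procedure is not reproduced here.

```python
import math


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def leading_eigenpair(m, iters=500):
    """Power iteration for a symmetric positive semi-definite matrix."""
    n = len(m)
    # Non-uniform start to lower the risk of starting orthogonal to the
    # leading eigenvector (an assumption, not a guarantee).
    v = [float(i + 1) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    value = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return value, v


def variance_explained(m, k):
    """Share of the trace captured by the k largest eigenvalues."""
    n = len(m)
    trace = sum(m[i][i] for i in range(n))
    work = [row[:] for row in m]
    captured = 0.0
    for _ in range(k):
        value, v = leading_eigenpair(work)
        captured += value
        # Deflation: remove the component just found.
        work = [[work[i][j] - value * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return captured / trace


# Hypothetical 3 x 3 covariance matrix, for illustration only.
cov = [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.25]]
share = variance_explained(cov, 2)
```

In practice a tested linear algebra routine would replace the hand-written power iteration; the pure-Python version is used here only to keep the sketch self-contained.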

    EM-REML estimation of covariance parameters in Gaussian mixed models for longitudinal data analysis

    This paper presents procedures for implementing the EM algorithm to compute REML estimates of variance-covariance components in Gaussian mixed models for longitudinal data analysis. The class of models considered includes random coefficient factors, stationary time processes and measurement errors. The EM algorithm allows separation of the computations pertaining to parameters involved in the random coefficient factors from those pertaining to the time processes and errors. The procedures are illustrated with Potthoff and Roy's data example on growth measurements taken on 11 girls and 16 boys at four ages. Several variants and extensions are discussed.

    A quasi-score approach to the analysis of ordered categorical data via a mixed heteroskedastic threshold model

    This article presents an extension of the methodology developed by Gilmour et al. [19], for ordered categorical data, taking into account the heterogeneity of residual variances of latent variables. Heterogeneity of residual variances is described via a structural linear model on log-variances. This method involves two main steps: i) a 'marginalization' with respect to the random effects leading to quasi-score estimators; ii) an approximation of the variance-covariance matrix of the observations which leads to an analogue of the Henderson mixed model equations for continuous Gaussian data. This methodology is illustrated by a numerical example of footshape in sheep.

    Genetic analysis of growth curves using the SAEM algorithm

    The analysis of nonlinear function-valued characters is very important in genetic studies, especially for growth traits of agricultural and laboratory species. Inference in nonlinear mixed effects models is, however, quite complex and is usually based on likelihood approximations or Bayesian methods. The aim of this paper was to present an efficient stochastic EM procedure, namely the SAEM algorithm, which is much faster to converge than the classical Monte Carlo EM algorithm and Bayesian estimation procedures, does not require specification of prior distributions and is quite robust to the choice of starting values. The key idea is to recycle the simulated values from one iteration to the next in the EM algorithm, which considerably accelerates the convergence. A simulation study is presented which confirms the advantages of this estimation procedure in the case of a genetic analysis. The SAEM algorithm was applied to real data sets on growth measurements in beef cattle and in chickens. The proposed estimation procedure, like the classical Monte Carlo EM algorithm, provides significance tests on the parameters and likelihood-based model comparison criteria to compare the nonlinear models with other longitudinal methods.

    Detection and modelling of time-dependent QTL in animal populations

    A longitudinal approach is proposed to map QTL affecting function-valued traits and to estimate their effect over time. The method is based on fitting mixed random regression models. The QTL allelic effects are modelled with random coefficient parametric curves and using a gametic relationship matrix. A simulation study was conducted in order to assess the ability of the approach to fit different patterns of QTL over time. It was found that this longitudinal approach was able to adequately fit the simulated variance functions and considerably improved the power of detection of time-varying QTL effects compared to the traditional univariate model. This was confirmed by an analysis of protein yield data in dairy cattle, where the model was able to detect QTL with high effect either at the beginning or the end of the lactation that were not detected with a simple 305-day model.

    Reverse engineering gene regulatory networks using approximate Bayesian computation

    Gene regulatory networks are collections of genes that interact with one another and with other substances in the cell. By measuring gene expression over time using high-throughput technologies, it may be possible to reverse engineer, or infer, the structure of the gene network involved in a particular cellular process. These gene expression data typically have a high dimensionality and a limited number of biological replicates and time points. Due to these issues and the complexity of biological systems, the problem of reverse engineering networks from gene expression data demands a specialized suite of statistical tools and methodologies. We propose a non-standard adaptation of a simulation-based approach known as Approximate Bayesian Computing based on Markov chain Monte Carlo sampling. This approach is particularly well suited for the inference of gene regulatory networks from longitudinal data. The performance of this approach is investigated via simulations and using longitudinal expression data from a genetic repair system in Escherichia coli. Comment: 16 pages, 11 figures.
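To make the simulation-based idea in the abstract above concrete, here is a minimal sketch of a generic ABC-MCMC sampler on a toy problem: inferring the mean of a normal distribution from its sample mean. It is not the network-inference adaptation described in the abstract. The uniform prior, tolerance, step size, starting value and all function names are assumptions chosen for illustration.

```python
import random


def simulate_summary(theta, n, rng):
    """Simulate n draws from N(theta, 1) and return their mean (the summary)."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


def abc_mcmc(observed_summary, n_obs, n_iter=5000, tolerance=0.3,
             step=0.5, prior=(-5.0, 5.0), seed=1):
    """Toy ABC-MCMC with a uniform prior and a symmetric Gaussian proposal.

    With a flat prior and a symmetric proposal the Metropolis-Hastings ratio
    is 1 inside the prior support, so a proposal is accepted whenever data
    simulated from it fall within `tolerance` of the observed summary.
    """
    rng = random.Random(seed)
    theta = observed_summary  # assumed starting value near the data
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        if prior[0] <= proposal <= prior[1]:
            simulated = simulate_summary(proposal, n_obs, rng)
            if abs(simulated - observed_summary) < tolerance:
                theta = proposal
        chain.append(theta)
    return chain


# Hypothetical observed sample mean of 2.0 from 20 observations.
chain = abc_mcmc(observed_summary=2.0, n_obs=20)
posterior_mean = sum(chain) / len(chain)
```

The likelihood is never evaluated: only the ability to simulate from the model is required, which is what makes the approach attractive for complex models. Shrinking `tolerance` tightens the approximation at the cost of a lower acceptance rate.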